JSON schema validation proposal #5852
base: main
Conversation
Force-pushed from ecbb00a to 2504ece
test-json.py uses jsonschema (https://pypi.org/project/jsonschema/) to validate JSON files against schemas, optionally with custom format checkers. All JSON files in the project must be accounted for in test-json.schema_mapping(), either as a schema, or as a file to be validated against a schema. The JSON schemas are implicitly validated against the most recent metaschema available in jsonschema.

This commit adds schemas for JSON addon files and for namingng config files.

TODO:
- runtime schema validation
- remove current ad-hoc validation
- generate documentation from schemas
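To make the mapping-driven approach concrete, here is a rough stdlib-only sketch. The real test-json.py delegates to the jsonschema package, so the toy `validate()` below (handling only the `type`, `required` and `properties` keywords) and the `ADDON_SCHEMA` shown are illustrative assumptions, not the PR's actual code.

```python
import json

# Stand-in for jsonschema.validate(): checks only "type", "required" and
# "properties". The real test suite uses the jsonschema package instead.
_TYPES = {"object": dict, "array": list, "string": str, "number": (int, float)}

def validate(instance, schema):
    errors = []
    expected = schema.get("type")
    if expected in _TYPES and not isinstance(instance, _TYPES[expected]):
        errors.append(f"value type not permitted by 'type' constraint: {expected}")
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"missing required property '{key}'")
        for key, sub in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], sub))
    return errors

# Hypothetical schema resembling the addon JSON schema discussed in this PR.
ADDON_SCHEMA = {
    "type": "object",
    "required": ["script"],
    "properties": {"script": {"type": "string"}, "args": {"type": "array"}},
}

print(validate(json.loads('{"script": 1}'), ADDON_SCHEMA))
```

Running this against the invalid inline JSON from the commit message (`{"script":1}`) reports the same kind of type-constraint violation the validator in the PR emits.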
Force-pushed from ed34bb0 to 022a7a8
For JSON schema validation in C++ code, valijson (https://github.com/tristanpenman/valijson) is added. It supports picojson and is actively maintained. It is used in single-header-include style, as that seems the simplest solution, at least for now. The header is added, alongside the LICENSE file, in externals/valijson/. Various warnings emitted when compiling the default code generated by valijson are fixed; these fixes were pushed upstream.

As JSON is now validated against a schema, various ad-hoc checks could be safely removed.

A simple way to test addon JSON schema validation is to invoke cppcheck using: `cppcheck --addon='{"script":1}' test.c`

A minor bugfix: the file specified by "executable" was searched using getFullPath() with fileName as the second argument; it looks like this should have been exename. The fileName argument of parseAddonInfo() is only used for log messages, and is now replaced by the string "inline JSON" instead of the actual JSON string that was parsed. The command given above will therefore output:

Loading inline JSON failed. JSON schema validation failed: &lt;root&gt; [script] Value type not permitted by 'type' constraint., &lt;root&gt; Failed to validate against schema associated with property name 'script'.

The addon JSON unit tests are also updated and now operate on JSON specified on the command line.
I don't know. I am not against validating JSON in theory, and you seem to have found a reasonable library. The schema is some standard format, I assume?
The output messages are more convoluted, as far as I can see.
On the other hand, we do need to improve our JSON validation.
def test_addon_json_invalid_args_1(tmpdir):
    __test_addon_json_invalid(tmpdir, json.dumps({'args':0}), "'args' must be an array.")
    __test_addon_json_invalid(tmpdir, json.dumps({'args':0}), "JSON schema validation failed: <root> [args] Value type not permitted by 'type' constraint., <root> Failed to validate against schema associated with property name 'args'., <root> Missing required property 'script'.")
hmm.. in my humble opinion the old output is more clear
I fully agree, and think we can improve a lot on error reporting here. At this time various (redundant) issues are reported for this case:
- "script" key is missing
- "args" has the wrong type
- "args" failed to verify
- schema verification failed
I'm pretty sure this can be simplified to something way shorter; I'll look into it.
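Purely to illustrate the deduplication idea (this is not code from the PR), the redundant wrapper messages could be filtered out after validation, keeping only the specific ones:

```python
# Hypothetical post-processing of validator output: drop the generic
# "Failed to validate against schema ..." wrapper errors and keep the
# specific messages. Falls back to the raw list if everything was filtered.
def simplify(errors):
    noise = "failed to validate against schema"
    kept = [e for e in errors if noise not in e.lower()]
    return kept or errors  # never return an empty report

raw = [
    "<root> [args] Value type not permitted by 'type' constraint.",
    "<root> Failed to validate against schema associated with property name 'args'.",
    "<root> Missing required property 'script'.",
]
print(simplify(raw))
```

With the three messages from the example above, only the type-constraint and missing-property errors would remain.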
Yes, this seems to be a (de facto) standard, see https://json-schema.org/specification
They are as it stands now, but that is not a function of having schema validation. Rather it is a function of the validator.
I think so too. The project will be more robust as all embedded JSON is validated. And after validating, the code can safely assume all is according to spec, reducing the amount of checking and error handling/reporting code.
Just some comments from looking at the implementation so far. The validation of the static configs is definitely fine in any case.

In terms of the addon output, we need to consider a potential performance impact compared to manual validation (which does not yet fully exist). It will surely be a drop in the ocean compared to what valueflow is doing, but I would still like to have the fastest path possible for the actual core performance. Maybe in that case it should (also) be implemented within the addon, so the addon makes sure that it only outputs valid data. Some addons also output debug messages we need to suppress, so maybe the whole output should be piped through a layer that disallows everything else.

I won't dig too deep into it yet, as the executor and language stuff, as well as all other local changes, are ahead of it on my list ... still.
For this reason I didn't address that yet. To me a first logical step would seem to make the addon output one big JSON, not a set of JSON objects on separate lines.
That would be less efficient as Python is way slower than C++ in almost every respect (unless we find a Cythonized JSON validator). Maybe have one JSON object on stdout, and warnings/errors on stderr? This will require a rewrite of the code that executes processes, as the
I would like to keep the interface between addon and cppcheck as-is for now, and first finish what's on the table. A few updates to this PR are underway...
As source files might be several megabytes, that doesn't seem like a good idea, as it would balloon the memory usage. Especially since we already have several caches which store these (still working on getting rid of those).
We are not using a very fast JSON library on the C++ side either. Also vanilla Python is making major strides in reclaiming the performance lost since 2.7. That actually brings up an interesting point. We could pre-compile the addons for better performance.
Same. If we change it, we need to provide backwards compatibility, as people might be running their own addons. We should probably just introduce a new interface if we plan any changes.
Newline-separated JSON objects is actually an (informal) format, called JSONL. Sounds like something that is easy to formalize/verify/unittest in the interface between addons and cppcheck. Debugging information could still be emitted between objects without violating the format, e.g. as a quoted string or as an object with a defined structure.
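As a sketch of what such an interface check could look like, a tolerant JSONL reader for the addon-to-cppcheck channel might route anything that is not a JSON object (stray debug output, quoted diagnostic strings) to a separate stream instead of aborting. This behaviour is an assumption for illustration, not what cppcheck currently implements.

```python
import json

def read_jsonl(lines):
    """Split addon output into result objects and diagnostic lines."""
    records, debug = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines between objects are harmless
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            debug.append(line)  # e.g. stray addon debug prints
            continue
        if isinstance(obj, dict):
            records.append(obj)
        else:
            debug.append(line)  # quoted strings etc. treated as diagnostics
    return records, debug

records, debug = read_jsonl([
    '{"file": "test.c", "linenr": 3, "errorId": "id1"}',
    'addon debug: parsing done',
])
```

Each valid object could then additionally be validated against a result schema before being turned into an error message.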
Heads up: #5870 will change the naming of the pytest-based test files.

I still haven't looked at this. I am currently in the midst of finishing up something else and will take a look afterwards.
Thanks. There are some additional conflicts to solve, plus some additional improvements that I will push in the coming days. |
Sorry for the late reply. I hope to take a proper look this week. What about the additional changes you wanted to push? |
add test/cli/test-json.py: Schema validation for JSON files.
test-json.py uses jsonschema (https://pypi.org/project/jsonschema/) to validate JSON files against schemas, optionally with custom format checkers.
All JSON files in the project must be accounted for in test-json.schema_mapping(), either as a schema, or as a file to be validated against a schema. The JSON schemas are implicitly validated against the most recent metaschema available in jsonschema.
This commit adds schemas for JSON addon files and for namingng config files.
TODO:
- runtime schema validation (done)
- remove current ad-hoc validation (done)

Feedback is more than welcome.